Abstract

We  describe  a  first  experiment  in  evaluating  the  system 
capabilities  of  the  Battlefield  Augmented  Reality  System,  an 
interactive  system  designed  to  present  military  information 
to  dismounted  warfighters.  We  describe  not  just  the  current 
experiment,  but  a  methodology  of  both  system  evaluation 
and  user  performance  measurement  in  the  system,  and  show 
how  both  types  of  tests  will  be  useful  in  system  development. 
We  summarize  results  in  a  perceptual  experiment  being  used 
to  inform  system  design,  and  discuss  ongoing  and  future 
experiments  to  which  the  work  described  herein  leads. 


1  Introduction 

One of the most challenging aspects of the design of intelligent
systems is the user interface: how the user will perceive and
understand the system. Our application presents military information
to a dismounted warfighter. In order to both refine the system’s
capabilities and improve the warfighter’s performance of tasks while
using the system, we measure human performance using our system,
even while early in the design phase of the user interface. This
paper describes an early experiment in the context of system
evaluation and describes implications for both system and human
performance metrics as they apply to such systems.

1.1  Application  context 

Military operations in urban terrain (MOUT) present many unique and
challenging conditions for the warfighter. The environment is
extremely complex and inherently three-dimensional. Above street
level, buildings serve varying purposes (such as hospitals or
communication stations). They can harbor many risks, such as snipers
or mines, which can be located on different floors. Below street
level, there can be an elaborate network of sewers and tunnels. The
environment can be cluttered and dynamic. Narrow streets restrict
line of sight and make it difficult to plan and coordinate group
activities. Threats, such as snipers, can continuously move and the
structure of the environment itself can change. For example, a
damaged building can fill a street with rubble, making a once-safe
route impassable. Such difficulties are compounded by the need to
minimize the number of civilian casualties and the amount of damage
to civilian targets.

In principle, many of these difficulties can be overcome through
better situation awareness. The Concepts Division of the Marine
Corps Combat Development Command (MCCDC) concludes [2]:

Units moving in or between zones must be able to navigate
effectively, and to coordinate their activities with units in other
zones, as well as with units moving outside the city. This
navigation and coordination capability must be resident at the
very-small-unit level, perhaps even with the individual Marine.

A number of research programs have explored the means by which
navigation and coordinated information can be delivered to
dismounted warfighters. We believe a mobile augmented reality system
best meets the needs of the dismounted warfighter.

1.2  Mobile  Augmented  Reality 

Augmented reality (AR) refers to the mixing of virtual cues into the
user’s perception of the real three-dimensional environment. In this
work, AR denotes the 3D merging of synthetic imagery into the user’s
natural view of the surrounding world, using an optical,
see-through, head-worn display.

Figure 1. A sample view of our system, showing one physically
visible building with representations of three buildings which it
occludes.

A mobile augmented reality system consists of a computer, a tracking
system, and a see-through head-mounted display (HMD). The system
tracks the position and orientation of the user’s head and
superimposes graphics and annotations that are aligned with real
objects in the user’s field of view. With this approach, complicated
spatial information can be directly aligned with the environment.
This contrasts with the use of hand-held displays and other
electronic 2D maps. With AR, for example, the name of a building
could appear as a “virtual sign post” attached directly to the side
of the building. To explore the feasibility of such a system, we are
developing the Battlefield Augmented Reality System (BARS). Figure 1
is an example from BARS. This system will network multiple
dismounted warfighters together with a command center.

Through the ability to present direct information overlays,
integrated into the user’s environment, AR has the potential to
provide significant benefits in many application areas. Many of
these benefits arise from the fact that the virtual cues presented
by an AR system can go beyond what is physically visible. Visuals
include textual annotations, directions, instructions, or “X-ray
vision,” which shows objects that are physically present, but
occluded from view. Potential application domains include
manufacturing [1], architecture [20], mechanical design and repair
[7], medical applications [4, 17], military applications [11],
tourism [6], and interactive entertainment [19].


1.3  Performance  Measurement  in  BARS 

BARS supports information gathering and human navigation for
situation awareness in an urban setting [11]. A critical aspect of
our research methodology is that it equally addresses both technical
and human factors issues in fielding mobile AR. AR system designers
have long recognized the need for standards for the performance of
AR technology. As the technology begins to mature, we and some other
research groups are also considering how to test user cognition when
aided by AR systems.

We first determined the task in which to measure performance through
consultation with domain experts [9]. They identified a strong need
to visualize the spatial locations of personnel, structures, and
vehicles occluded by buildings and other urban structures during
military operations in urban terrain. While we can provide an
overhead map view to show these relationships, using the map
requires a context switch. We are designing visualization methods
that enable the user to understand these relationships when directly
viewing, in a heads-up manner, the augmented world in front of them.

The perceptual community has studied depth and layout perception for
many years. Cutting [3] divides the visual field into three areas
based on distance from the observer: near-field (within arm’s
reach), medium-field (within approximately 30 meters), and far-field
(beyond 30 meters). He then points out which depth cues are more or
less effective in each field. Occlusion is the primary cue in all
three spaces, but with the AR metaphor and the optical see-through
display, this cue is diminished. Perspective cues are also important
for far-field objects, but this assumes that they are physically
visible. The question for an AR system is which cues work when the
user is being shown virtual representations of objects integrated
into a real scene.

Our immediate goal is thus to determine methods that are appropriate
for conveying depth relationships to BARS users. This requires
measurement of the system’s performance in presenting information
that feeds the users’ perceptions of the surrounding environment.
Then, we need to establish a standard for warfighter performance in
the task of locating military personnel and equipment during an
operation in urban terrain. For example, one goal of our work is to
determine how many depth layers a user can understand.

2  Related  Work 

2.1  Perceptual  Measures  in  AR  Systems 

A  number  of  representations  have  been  used  to  convey 
depth  relationships  between  real  and  virtual  objects.  Partial 
transparency,  dashed  lines,  overlays,  and  virtual  cut-away 
views  all  give  the  user  the  impression  of  a  difference  in  the 
depth  [7,  16,  20,  12]. 


Furmanski  et  al.  [8]  utilized  a  similar  approach  in  their 
pilot  experiment.  Using  video  AR,  they  showed  users  a 
stimulus  which  was  either  behind  or  at  the  same  distance 
as  an  obstructing  surface.  They  then  asked  users  to  identify 
whether  the  stimulus  was  behind,  at  the  same  distance  as, 
or  closer  than  the  obstruction.  The  performance  metric  here 
is  thus  an  ordinal  depth  measure.  Only  a  single  occluded 
object  was  present  in  the  test.  The  parameters  in  the  pilot 
test  were  the  presence  of  a  cutaway  in  the  obstruction  and 
motion  parallax.  The  presence  of  the  cutaway  significantly 
improved  users’  perceptions  of  the  correct  location  when 
the  stimulus  was  behind  the  obstruction.  The  authors  of¬ 
fered  three  possible  locations  to  the  users,  even  though  only 
two  locations  were  used.  Users  consistently  believed  that 
the  stimulus  was  in  front  of  the  obstruction,  despite  the  fact 
that  it  was  never  there. 

Ellis and Menges [5] found that the presence of a visible (real)
surface near a virtual object significantly influences the user’s
perception of the depth of the virtual object. For most users, the
virtual object appeared to be nearer than it really was. This varied
widely with the user’s age and ability to use accommodation, even to
the point of some users being influenced to think that the virtual
object was further away than it really was. Adding virtual
backgrounds with texture reduced the errors, as did the introduction
of virtual holes, similar to those described above. Rolland et
al. [13] found that occlusion of the real object by the virtual
object gave the incorrect impression that the virtual object was in
front, despite the object being located behind the real object and
other perceptual cues denoting this relationship. Further studies
showed that users performed better when allowed to adjust the depth
of virtual objects than when making forced-choice decisions about
the objects’ locations [14].

2.2  Cognitive  Measures  in  AR  Systems 

There have been few user studies conducted with AR systems; most
such studies (including ours) have been at the perceptual level,
such as those described above. The recent emergence of hardware
capable of delivering sufficient performance to achieve stable
presentation of graphics does enable such studies, however. One
example of a cognitive-level study is the application of AR to
medical interventions with ultrasound guidance [15]. In this trial,
a doctor performed ultrasound-guided needle biopsies with and
without the assistance of an AR system that had been designed for
the task. A second physician evaluated the needle placement of the
first. The analysis showed that needle localization was improved
when using the AR system. The performance metrics in this trial were
the standard for evaluating doctors’ performance used by medical
schools: needle placement at various locations within the target
lesion. The physician uses the ultrasound to determine the ideal and
actual needle locations. Thus the measure is tightly connected to
the task, and in fact exists prior to the development of the AR
system.

3  Experiment 

As noted above, we have begun our performance measurements with the
subsystem that depicts occluded surfaces. The first test we
performed was a perceptual experiment to determine whether the
system provides sufficient information for the user to understand
three layers of depth among large objects that are occluded from
view.

3.1  Design  Methodology 

From our initial design work and review by colleagues, we selected
three graphical parameters to vary in our representations: drawing
style, opacity, and intensity. These comprised a critical yet
tenable set of parameters for our study. We used an urban
environment that fit our laboratory facilities. By sitting in the
atrium of our building, a user could wear an indoor-based version of
our system (which is more powerful than the current mobile
prototypes). The environment included one physically visible
building and two occluded buildings. Among the two occluded
buildings we placed one target to locate in one of three different
positions: closer than the two occluded buildings, between the two,
or behind both. This introduced the question of whether the ground
plane (i.e. perspective) would provide the only cue that users would
actually use. Because our application may require users to visualize
objects that are not on the ground or are at a great distance across
hilly terrain, we added the use of a consistent, flat ground plane
for all objects as a parameter.

3.2  Hardware 

The hardware for our AR platform consisted of three components. For
the image generator, we used a Pentium IV 1.7 GHz computer with an
ATI FireGL2 graphics card (outputting frame-sequential stereo). For
the display device, we used a Sony Glasstron LDI-100B stereo optical
see-through display (SVGA resolution, 20° horizontal field of view
in each eye). The user was seated indoors for the experiment and was
allowed to move and turn the head and upper body freely while
viewing the scene, which was visible through an open doorway to the
outdoors. We used an InterSense IS-900 6-DOF ultrasonic/inertial
hybrid tracking system to track the user’s head motion to provide a
consistent 3D location for the objects as the user viewed the world.
The IS-900 provides position accuracy to 3.0 mm and orientation
accuracy to 1.0°.

The user entered a choice for each trial on a standard extended
keyboard, which was placed on a stand in front of the seat at a
comfortable distance. The display device, whose transparency can be
adjusted in hardware, was set for maximum opacity of the LCD, to
counteract the bright sunlight that was present for most trials.
Some trials did experience a mix of sunshine and cloudiness, but the
opacity setting was not altered. The display brightness was set to
the maximum.

The display unfortunately does not permit adjustment of the
inter-pupillary distance (IPD) for each user. If the IPD is too
small, then the user will be seeing slightly cross-eyed and tend to
believe objects are closer than they are. The display also does not
permit adjusting the focal distance of the graphics. The focal
distance of the virtual objects is therefore closer than the real
object that we used as the closest obstruction. This would tend to
lead users to believe the virtual objects were closer than they
really were.

Stereo is considered a powerful depth cue at near-field distances
(approximately 1.0 meter, or about arm’s length). At far-field
distances, such as those in the task we gave our users, stereo is
not considered to be a strong depth cue; however, we wanted to be
able to provide some statistical evidence for this claim. Many
practitioners of AR systems have noted that improper settings of
parameters related to stereo imagery (such as IPD and vergence) can
lead to user discomfort in the form of headaches or dizziness. None
of our users reported any such problems; they wore the device for an
average of 30 minutes. These issues will need to be addressed in
future versions of the hardware for AR systems, but are beyond the
scope of our work.

3.3  Experimental  Design 

3.3.1  Independent  Variables 

From our heuristic evaluation and from previous work, we identified
the following independent variables for our experiment. These were
all within-subject variables; every user saw every level of each
variable.

Drawing Style (“wire”, “fill”, “wire+fill”): Although the same
geometry was visible in each stimulus (except for which target was
shown), the representation of that geometry was changed to determine
what effect it had on depth perception. We used three drawing styles
(Figure 2). In the first, all objects are drawn as wireframe
outlines. In the second, the first (physically visible) object is
drawn as a wireframe outline, and all other objects are drawn with
solid fill (with no wireframe outline). In the third style, the
first object is in wireframe, and all other layers are drawn with
solid fill with a white wireframe outline. Backface culling was on
for all drawing styles, so that the user saw only two faces of any
occluded building.

Opacity (constant, decreasing): We designed two sets of values for
the α channel based on the number of occluding objects. In the
“constant” style, the first layer (visible with registered wireframe
outline) is completely opaque, and all other layers have the same
opacity (α = 0.5). In the “decreasing” style, opacity changes for
each layer. The first (physically visible, wireframe) layer is
completely opaque. The successive layers are not opaque; the α
values were 0.6, 0.5, and 0.4 for the successively more distant
layers.

Figure 3. The experimental design (not to scale) shows the user
position at the left. Obstruction 1 denotes the visible surfaces of
the physically visible building. The distance from the user to
obstruction 1 is approximately 60 meters. The distance from the user
to target location 3 is approximately 500 meters, with the
obstructions and target locations roughly equally spaced.

Intensity (constant, decreasing): We used two sets of intensity
modulation values. The modulation value was applied to the object
color (in each color channel, but not in the opacity or α channel)
for the object in the layer for which it was specified. In the
“constant” style, the first layer (visible with registered wireframe
outline) has full intensity (modulator = 1.0) and all other layers
have intensity modulator = 0.5. In the “decreasing” style, the first
layer has its full native intensity, but successive layers are
modulated as a function of occluding layers: 0.75 for the first,
0.50 for the second, and 0.25 for the third (final) layer.
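The opacity and intensity settings described above can be summarized
in a short sketch. This is only an illustration of the published
values, not the BARS rendering code; the function names
layer_appearance and modulate, and the tuple layout, are assumptions
made for this example.

```python
# Illustrative sketch only: names and data layout are hypothetical.
DECREASING_ALPHA = (0.6, 0.5, 0.4)          # values stated in the text
DECREASING_INTENSITY = (0.75, 0.50, 0.25)   # values stated in the text


def layer_appearance(num_layers, opacity="constant", intensity="constant"):
    """Return one (alpha, intensity modulator) pair per layer, nearest first.

    Layer 0 is the physically visible building: fully opaque, full intensity.
    """
    if not 1 <= num_layers <= 4:
        raise ValueError("the text specifies values for at most four layers")
    layers = [(1.0, 1.0)]
    for k in range(num_layers - 1):
        alpha = 0.5 if opacity == "constant" else DECREASING_ALPHA[k]
        modulator = 0.5 if intensity == "constant" else DECREASING_INTENSITY[k]
        layers.append((alpha, modulator))
    return layers


def modulate(rgb, modulator):
    """Scale each color channel by the modulator; alpha is handled separately."""
    return tuple(channel * modulator for channel in rgb)
```

For example, layer_appearance(4, "decreasing", "constant") pairs the
decreasing α values with a constant modulator of 0.5 for every
occluded layer.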

Target  Position  (close,  middle,  far):  As  shown  in  the 
overhead  map  view  (Figure  3),  there  were  three  possible 
locations  for  the  target. 

Ground Plane (on, off): From the literature and everyday experience,
we know that the perspective effects of the ground plane rising to
meet the horizon and apparent object size are strong depth cues. In
order to test the representations as an aid to depth ordering, we
removed the ground plane constraint in half of the trials. The
building sizes were chosen to have the same apparent size from the
users’ location for all trials. When the ground plane constraint was
not present in the stimulus, the silhouette of each target was fixed
for a given pose of the user. In other words, targets two and three
were not only scaled (to yield the same apparent size) but also
positioned vertically such that all three targets would occupy the
same pixels on the 2D screen for the same viewing position and
orientation. No variation in position with respect to the two
horizontal dimensions was necessary when changing from using the
ground plane to not using it. The obstructions were always presented
with the same ground plane. We informed the users for which half of
the session the ground plane would be consistent between targets and
obstructions.

Figure 2. User’s view of the stimuli. Left, “wire” drawing style.
Center, “fill” drawing style. Right, “wire+fill” drawing style. The
target (smallest, most central box) is between (position “middle”)
obstructions 2 and 3 in all three pictures. These pictures were
acquired by placing a camera to the eyepiece of the HMD, which
accounts for the poor image quality. The vignetting and distortion
are due to the camera lens and the fact that it does not quite fit
in the exit pupil of the HMD’s optics.

We did this because we wanted to remove the effects of perspective
from the study. Our application requires that we be able to
visualize objects that may not be on the ground, may be at a
distance and size such that their realistic apparent size would be
too small to discern, and may be viewed over hilly terrain. Since
our users may not be able to rely on these effects, we attempted to
remove them from the study.
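One way to see why scaling plus vertical repositioning can make
targets at different depths occupy the same pixels is to scale the
target’s geometry about the viewer’s eye point. The sketch below
assumes a simple pinhole model with the viewer looking along +z and
y pointing up; the function names, the coordinate convention, and
the numbers in the usage note are illustrative assumptions, not
details taken from the experiment software.

```python
# Illustrative sketch only: a pinhole-camera model with hypothetical names.
def rescale_about_eye(vertices, eye, factor):
    """Push each vertex along its ray from the eye by the given factor.

    Distances from the eye grow by `factor`, and so does the object's size,
    so its projected silhouette is unchanged.
    """
    return [
        tuple(e + factor * (v - e) for v, e in zip(vertex, eye))
        for vertex in vertices
    ]


def project(vertex, eye):
    """Project a 3D point onto an image plane at unit distance from the eye."""
    dx, dy, dz = (v - e for v, e in zip(vertex, eye))
    if dz <= 0:
        raise ValueError("point must be in front of the viewer")
    return (dx / dz, dy / dz)
```

For instance, with the eye at (0, 1.5, 0) and a target face whose
base sits on the ground at a depth of 200 m, a factor of 2.5 moves
the face to 500 m and places its base at y = -2.25, below the ground
plane. This corresponds to the vertical repositioning described
above, and the projected corners are unchanged.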

Stereo (on, off): The Sony Glasstron display receives as input left-
and right-eye images. The IPD and vergence angle are not adjustable,
so we cannot provide a true stereo image for all users. However, we
can present images with disparity (which we call “stereo” for the
experiment) or present two identical images (“biocular”).

Repetition (1, 2, 3): Each user saw three repetitions of each
combination of the other independent variables. It is well-known
that users will often improve their performance with repetition of
the same stimulus within an experiment. By repeating the stimuli, we
can gain some insight into whether the user needs to learn how the
system presents cues or whether the system presents intuitive cues.
If there is no learning effect with repetition of stimuli, then we
can infer that the users achieved their collective level of
performance intuitively.

3.3.2  Dependent  Variables 

For each trial, we recorded the user’s (three-alternative forced)
choice for the target location and the time the user took to enter
the response after the software presented the stimulus. We opted to
ask the user only to identify the ordinal depth, not an absolute
distance between the graphical layers. This implied the
forced-choice design.

All combinations of these parameters were encountered by each user;
however, the order in which these were presented was also randomly
permuted. Thus each user viewed 432 trials. Users took between
twenty and forty minutes to complete the full set of trials. The
users were told to make their best guess upon viewing the trial and
not to linger; however, no time limit per trial was enforced. The
users were instructed to aim for a balance of accuracy and speed,
rather than favoring one over the other.

Figure 4. Experimental design and counterbalancing for one user.
Systematically varied parameters were counterbalanced between
subjects. (Legend: sv = systematically varied; rp = randomly
permuted.)

3.3.3  Counterbalancing 

In order to reduce time-based confounding factors, we
counterbalanced the stimuli. This helps control for learning and
fatigue effects within each user’s trials and for factors beyond our
control, such as the amount of sunshine, that change between
subjects. Figure 4 describes how we counterbalanced the stimuli. We
observed (in agreement with many previous authors) that the most
noticeable variable was the presence of the ground plane [3, 18]. In
order to minimize potentially confusing large-scale visual changes,
we gave ground plane and stereo the slowest variation. Following
this logic, we next varied the parameters which controlled the
scene’s visual appearance (drawing style, alpha, and intensity), and
within the resulting blocks, we created nine trials by varying
target position and repetition.

3.4  Experimental  Task 

We  designed  a  small  virtual  world  that  consisted  of  four 
buildings  (Figure  3),  with  three  potential  target  locations. 
The  first  building  was  an  obstruction  that  corresponded  (to 
the  limit  of  our  modeling  accuracy)  to  a  building  that  was 
physically  visible  during  the  experiment.  The  obstructions 
were  always  drawn  in  blue;  the  target  always  appeared  in 
red.  The  target  was  scaled  such  that  its  apparent  2D  size 
was  equal,  regardless  of  its  location.  Obstructions  2  and  3 
roughly  corresponded  to  real  buildings.  The  three  possible 
target  locations  did  not  correspond  to  real  buildings. 

The task for each trial was to determine the location of the target
that was drawn. The user was shown the overhead view before
beginning the experiment. This helped them visualize their choices
and would be an aid available in a working application of our
system. The experimenter explained that only one target would appear
at a time. Thus in all of the stimulus pictures, four objects were
visible: three obstructions and the target. For the trials, users
were instructed to use the number pad of a standard extended
keyboard and press a key in the bottom row of numbers (1-3) if the
target was closer than obstructions 2 and 3, a key in the middle row
(4-6) if the target was between obstructions 2 and 3, or a key in
the top row (7-9) if the target was further than obstructions 2 and
3. A one-second delay was introduced between trials within sets, and
a rest period was allowed between sets for as long as the user
wished. We showed the user 48 sets of nine trials each. The users
reported no difficulties with the primitive interface after their
respective practice sessions. The users did not try to use head
motion to provide parallax, which is not surprising for a far-field
visualization task.
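The keypad response scheme above reduces to a mapping from key rows
to target positions. A minimal sketch follows; the function name and
the use of integer key values are assumptions for illustration.

```python
# Illustrative sketch only: maps a number-pad digit to the chosen position.
POSITIONS = ("close", "middle", "far")


def response_from_key(key):
    """Bottom row (1-3) -> close, middle row (4-6) -> middle, top row (7-9) -> far."""
    if not 1 <= key <= 9:
        raise ValueError("expected a number-pad digit from 1 to 9")
    return POSITIONS[(key - 1) // 3]
```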

3.5  Subjects 

Eight  users  completed  the  experiment  (432  trials  each). 
All  subjects  were  male  and  ranged  in  age  from  20  to  48.  All 
volunteered and received no compensation. Our subjects reported
being heavy computer users. Two were familiar with
computer  graphics,  but  none  had  seen  our  representations. 
Subjects  did  not  have  difficulty  learning  or  completing  the 
experiment. 

Before  the  experiment,  we  asked  users  to  complete  a 
stereo  acuity  test,  in  case  stereo  had  produced  an  effect.  The 
test  pattern  consisted  of  nine  shapes  containing  four  circles 
each.  For  each  set  of  four  circles,  the  user  was  asked  to 
identify  which  circle  was  closer  than  the  other  three.  Seven 
users  answered  all  nine  test  questions  correctly,  while  the 
other  user  answered  eight  correctly. 


4  Hypotheses 

We made the following hypotheses about our independent variables.

1.  The  ground  plane  would  have  a  strong  positive  effect 
on  the  user’s  perception  of  the  relative  depth. 

2. The wireframe representation (our system’s only option before
this study) would have a strong negative effect on the user’s
perception.

3. Stereo imagery would not yield different results than biocular
imagery, since all objects are in the far-field [3].

4. Decreasing intensity would have a strong positive effect on the
user’s perception for all representations.

5. Decreasing opacity would have a strong positive effect on the
user’s perception of the “fill” and “wire+fill” representations. In
the case of wireframe representation the effect would be similar to
decreasing intensity. Apart from the few pixels where lines actually
cross, decreasing opacity would let more and more of the background
scene shine through, thereby indirectly leading to decreased
intensity.

5  Results 

There are a number of error metrics we apply to the experimental
data. Figure 5 categorizes the user responses. Subjects made 79%
correct choices and 21% erroneous choices. We found that subjects
favored the far position, choosing it 39% of the time, followed by
the middle position (34%), and then by the close position (27%). We
also found that subjects were the most accurate in the far position;
89% of their choices were correct when the target was in the far
position, as compared to 76% correct in the close position, and 72%
correct in the middle position.

As discussed above, we measured two dependent variables: user
response time and user error. For user response time, the system
measured the time in milliseconds (ms) between when it drew the
scene and when the user responded. Response time is an interesting
metric because it indicates how intuitive the representations are to
the user. We want the system to convey information as naturally as
the user’s vision does in analogous real-world situations.

For user error, we calculated the metric e = |a - u|,
where a is the actual target position (between 1 and 3) and
u is the target position chosen by the user (also between 1
and 3). Thus, if e = 0 the user has chosen the correct target;
if e = 1 the user is off by one position, and if e = 2 the user
is off by two positions.
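
As a minimal sketch of the error metric defined above, the following Python function computes e for a single trial. The function name and the range check are illustrative assumptions and do not come from the experimental software.

```python
def position_error(actual: int, chosen: int) -> int:
    """Return e = |a - u| for target positions numbered 1 to 3.

    `actual` is the true target position (a) and `chosen` is the
    position the user selected (u). Both names are hypothetical.
    """
    for name, value in (("actual", actual), ("chosen", chosen)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    # 0 = correct, 1 = off by one position, 2 = off by two positions
    return abs(actual - chosen)
```

Under this definition, only the two extreme positions can produce an error of 2; a target in the middle position yields an error of at most 1.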

Figure 5. User responses by target position. For
each target position, the bars show the number
of times subjects chose the (C)lose, (M)iddle, and
(F)ar positions. Subjects were either correct when
their choice matched the target position (white), off
by one position (light gray), or off by two positions
(dark gray).

We conducted significance testing for both response
time and user error with a standard analysis of variance
(ANOVA) procedure. In the summary below, we report user
errors in positions (pos).
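
To make the F ratio reported in the summary concrete, the sketch below computes the F statistic of a simple one-way, between-groups ANOVA. This is a simplification chosen for illustration: the text does not specify the exact ANOVA design used in the study, and the function name and input layout are assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F ratio for a one-way, between-groups ANOVA.

    `groups` is a list of lists, one list of measurements per condition.
    This is an illustrative simplification, not the study's procedure.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > k observations")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of the condition means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own condition mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

An F ratio below 1 means that the variation between condition means is smaller than the variation within conditions, which is how the absence of an effect is reported in this section.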

We briefly discuss the factors that affected user performance.
As we expected, subjects were more accurate when
a ground plane was present (.1435 pos) than when it was
absent (.3056 pos). Interestingly, there was no effect of
ground plane on response time (F < 1). This indicates
that subjects did not learn to just look at the ground plane
and immediately respond from that cue alone, but were in
fact also attending to the graphics.

Figure 6 shows that subjects were slower using the
“wire” style than the “fill” and “wire+fill” styles. Subjects
had the fewest errors with the “wire+fill” style. These results
verified our hypotheses that the “wire” style would
not be very effective, and the “wire+fill” style would be the
most effective, since it combines the occlusion properties
of the “fill” style with the wireframe outlines, which help
convey the targets’ shapes.

Subjects were more accurate with decreasing opacity
(.1962 pos) than with constant opacity (.2529 pos). This
makes sense because the decreasing opacity setting made
the difference between the layers more salient. Subjects
were both faster (2340 versus 2592 ms) and more accurate
(.1811 versus .2679 pos) with decreasing intensity than with
constant intensity. This result was expected, as decreasing
intensity did a better job of distinguishing the layers.
However, Figure 7 shows that the effect on response time is
due to the difference between constant and decreasing
intensity when the target is drawn in the “wire” style.

Figure 6. Main effect of drawing style on response
time (□) and error (o).

Figure 7. Drawing style by intensity (constant (□),
decreasing (o)) interaction on response time.

As expected from training effects, subjects became faster
with repetition. However, repetition had no effect on absolute
error (F < 1), so although subjects became faster, they
did not become more accurate. This can be taken as a sign
that the presented visuals were understandable to the subjects
right from the outset: no learning effect took place
regarding accuracy. Subjects became faster, though, which
is a sign that their level of confidence increased.

6  Discussion 

In  a  broad  context,  we  believe  that  our  methodology  will 
enable  us  to  evaluate  both  system  capabilities  and  user  per¬ 
formance  with  the  system.  Human  perception  is  an  innate 
ability,  and  variations  in  performance  will  reflect  the  sys¬ 
tem’s  appropriateness  for  use  by  dismounted  warfighters. 
Thus,  we  are  really  evaluating  the  system’s  performance  by 
measuring  the  user’s  performance  on  perceptual-level  tasks. 
The evaluation of cognitive-level tasks will enable us to
determine how users are performing. Such high-level metrics
can only be measured after the results of the perceptual-level
tests inform the system design.

Figure 8. Drawing style by intensity (constant (□),
decreasing (o)) interaction on absolute error.

Our first experiment has given insight into how users perceive
data presented in the system. The application of our
results to human perception and thus our system design is
straightforward. It is well-known that a consistent ground
plane (a perspective constraint) is a powerful depth cue.
However, we can now provide statistical backing for our
fundamental hypothesis that graphical parameters can provide
strong depth cues, albeit not physically realistic cues.
We found that with the ground plane on the average error
was .144 pos, whereas with the ground plane off and the
following settings:

•  drawing  style:  “wire+fill”

•  opacity:  decreasing 

•  intensity:  decreasing 

the  average  error  was  .111  pos.  The  data  thus  suggest  that 
we  did  find  a  set  of  graphical  parameters  as  powerful  as  the 
presence  of  the  ground  plane  constraint.  This  would  indeed 
be  a  powerful  statement,  but  requires  further  testing  before 
we  can  say  for  sure  whether  this  is  our  finding.  As  a  sec¬ 
ondary  result,  the  fact  that  there  was  a  main  effect  of  rep¬ 
etition  on  response  time  but  not  on  accuracy  indicates  that 
the  subjects  could  quickly  understand  the  semantic  meaning 
of  the  encodings.  This  validates  that  BARS  is  performing 
at  a  level  that  is  sufficient  for  users  to  consistently  (but  not 
always)  identify  the  ordinal  depth  among  three  occluded  ob¬ 
jects. 
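
The comparison above rests on the mean absolute error per combination of settings. A minimal sketch of that aggregation follows; the record layout and field names are hypothetical, and any data passed to it here would be placeholders rather than the study's measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_error_by_condition(trials, keys):
    """Group trials by the given setting names and average |a - u|.

    `trials` is a list of dicts with hypothetical fields "actual" and
    "chosen" plus one field per experimental setting. `keys` names the
    settings that define a condition, e.g. ("ground", "style").
    """
    buckets = defaultdict(list)
    for trial in trials:
        condition = tuple(trial[k] for k in keys)
        buckets[condition].append(abs(trial["actual"] - trial["chosen"]))
    return {condition: mean(errors) for condition, errors in buckets.items()}
```

Grouping on several keys at once, for example ground plane, drawing style, opacity, and intensity, gives the per-combination averages needed for the kind of comparison described above.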

There  are  several  next  steps  available  to  us.  Further 
perceptual-level  testing  will  demonstrate  whether  these  re¬ 
sults  extend  to  more  complex  scenes  (with  more  layers  of 
depth). We are currently designing a follow-up study that
will use not just an ordinal depth metric, but an absolute
distance metric. This study will ask the user to move a virtual
object into depth alignment with real objects. We are
developing metrics to apply to the user’s control of the object,
such as the number of oscillations they use to place the
object into position, that will give us insight into their
confidence in the depth estimates they perceive through BARS.
We  are  also  considering  ways  in  which  to  measure  the  user’s 
subjective  reaction  to  the  system,  as  this  is  also  an  important 
aspect  of  the  system’s  capabilities. 
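
One possible way to quantify the oscillation metric mentioned above is to count direction reversals in the recorded depth of the virtual object. The sketch below is an assumption about how such a metric could be defined; the function name, the sampled-depth input, and the tolerance parameter are all hypothetical.

```python
def count_reversals(depths, tolerance=0.0):
    """Count changes of movement direction in a sequence of depth samples.

    `depths` is a list of object depths sampled over time. Movements no
    larger than `tolerance` are ignored so that jitter does not count as
    an oscillation. This definition is illustrative only.
    """
    reversals = 0
    last_direction = 0  # 0 until the first significant movement
    for previous, current in zip(depths, depths[1:]):
        delta = current - previous
        if abs(delta) <= tolerance:
            continue
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals
```

A user who moves the object steadily to its final depth produces zero reversals, whereas repeated overshooting and correcting produces a higher count, which could be read as lower confidence in the perceived depth.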

Once  these  results  inform  our  future  system  design,  we 
will  move  up  to  cognitive-level  testing,  in  which  we  hope 
to  have  multiple  users  wear  prototype  systems  in  an  urban 
environment.  We  can  have  users  identify  locations  of  ob¬ 
jects  relative  to  maps  or  to  each  other.  We  could  have  users 
retrieve  objects  from  the  environment.  The  metrics  we  plan 
to  use  will  reflect  the  cognition  required.  Distance  and  re¬ 
sponse  time  will  remain  interesting  measures,  but  now  the 
absolute  distance  will  become  more  important.  We  will  be 
able  to  add  directional  measures  as  well,  concomitant  with 
the  increased  complexity  of  the  task  for  a  mobile  user.  Since 
our  application  is  designed  for  a  military  context,  we  in¬ 
tend  to  design  our  cognitive-level  tests  in  conjunction  with 
military  domain  experts  and  have  at  least  some  of  the  sub¬ 
jects  in  our  studies  be  active  members  of  the  military.  This 
introduces  the  opportunity  to  measure  system  performance 
by  comparing  against  current  performance  of  dismounted 
warfighters  in  these  tasks.  This  combined  design  and  evalu¬ 
ation  methodology  will  enable  us  to  evaluate  the  Battlefield 
Augmented  Reality  System  and  its  users. 